Developer Source 7

Developer Source 7 / developer source - volume 7.iso / dobbs / mar97 / singf106.gif

Graphics Interchange Format | 1997-06-26 | 26.0 KB | 244x290 | 4-bit (16 colors)

ocr: [Diagram: a chain of states s_t, s_{t+1}, s_{t+2} linked by actions]

Figure 6: The program's experience consists of a trajectory through state space. At time step t, the state is s_t, and the agent faces a choice of actions. Note that the action the agent chooses to execute at step t is a_t. The reward at step t, Reward_t, is a function of s_t and a_t. The next state s_{t+1} depends on s_t, a_t, and many random events such as passengers arriving at floors and pushing buttons. Reinforcement learning allows a program to use such a trajectory to incrementally improve its policy.
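The trajectory described in the caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the article's program: the one-car, four-floor building, the action set, the arrival probability, and the reward values are all hypothetical, and tabular Q-learning is used here as one possible way to improve a policy incrementally, since the caption does not name a specific algorithm.

```python
import random
from collections import defaultdict

# All constants below are hypothetical choices for a toy example.
ACTIONS = ("up", "down", "stay")
N_FLOORS = 4
ARRIVAL_PROB = 0.3


def reward(state, action):
    """Reward_t as a function of s_t and a_t only: penalize waiting calls and movement."""
    _floor, calls = state
    move_cost = 0.0 if action == "stay" else 0.1
    return -len(calls) - move_cost


def next_state(state, action, rng):
    """s_{t+1} depends on s_t, a_t, and random events (a passenger pressing a button)."""
    floor, calls = state
    if action == "up":
        floor = min(floor + 1, N_FLOORS - 1)
    elif action == "down":
        floor = max(floor - 1, 0)
    remaining = set(calls) - {floor}          # the car serves the floor it is now on
    if rng.random() < ARRIVAL_PROB:           # random arrival at some floor
        remaining.add(rng.randrange(N_FLOORS))
    return (floor, frozenset(remaining))


def run_trajectory(steps, seed=0, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Generate (s_t, a_t, Reward_t, s_{t+1}) tuples, updating Q-values after each step."""
    rng = random.Random(seed)
    q = defaultdict(float)                    # Q[(state, action)], initially 0.0
    state = (0, frozenset())                  # car at floor 0, no calls waiting
    trajectory = []
    for _ in range(steps):
        # Epsilon-greedy choice among the available actions.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        r = reward(state, action)
        nxt = next_state(state, action, rng)
        # One-step Q-learning update from this single transition.
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
        trajectory.append((state, action, r, nxt))
        state = nxt
    return trajectory, q


if __name__ == "__main__":
    traj, _q = run_trajectory(5)
    for t, (s, a, r, s_next) in enumerate(traj):
        print(t, s, a, r, s_next)
```

Each loop iteration corresponds to one link in the figure: the agent observes s_t, picks a_t, receives Reward_t, and lands in s_{t+1}, which becomes the state for the next step. The update is applied after every transition, so the policy changes a little at a time rather than after the whole run.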